Individual Assignment

take a try

Li Yumeng https://nicetry.netlify.app/
07-11-2021

VAST Challenge 2021 - Mini-Challenge 2

Many of the Abila, Kronos-based employees of GAStech have company cars which are approved for both personal and business use. Those who do not have company cars have the ability to check out company trucks for business use, but these trucks cannot be used for personal business.

Employees with company cars are happy to have these vehicles, because the company cars are generally much higher quality than the cars they would be able to afford otherwise. However, GAStech does not trust their employees. Without the employees? knowledge, GAStech has installed geospatial tracking software in the company vehicles. The vehicles are tracked periodically as long as they are moving.

This vehicle tracking data has been made available to law enforcement to support their investigation. Unfortunately, data is not available for the day the GAStech employees went missing. Data is only available for the two weeks prior to the disappearance.

To promote local businesses, Kronos based companies provide a Kronos Kares benefit card to GASTech employees giving them discounts and rewards in exchange for collecting information about their credit card purchases and preferences as recorded on loyalty cards. This data has been made available to investigators in the hopes that it can help resolve the situation. However, Kronos Kares does not collect personal information beyond purchases.

As a visual analytics expert assisting law enforcement, your mission is to identify which GASTech employees made which purchases and identify suspicious patterns of behavior. You must cope with uncertainties that result from missing, conflicting, and imperfect data to make recommendations for further investigation.

Use visual analytics to analyze the available data and develop responses to the questions below. In addition, prepare a video that shows how you used visual analytics to solve this challenge. Submission instructions are available here. Entry forms are available for download below.

Note:

Set R Studio environment and installed needed packages

packages = c('igraph', 'tidygraph', 'ggraph', 'visNetwork', 'lubridate', 'clock', 'tidyverse','dplyr', 'tidyr','raster','sf','sp','tmap', 'gifski','mapview','writexl')
for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p,character.only = T)
}

Input and read dataset

credit_card <- read.csv("data/cc_data.csv")
glimpse(credit_card)
Rows: 1,490
Columns: 4
$ timestamp  <chr> "1/6/2014 7:28", "1/6/2014 7:34", "1/6/2014 7:35"~
$ location   <chr> "Brew've Been Served", "Hallowed Grounds", "Brew'~
$ price      <dbl> 11.34, 52.22, 8.33, 16.72, 4.24, 4.17, 28.73, 9.6~
$ last4ccnum <int> 4795, 7108, 6816, 9617, 7384, 5368, 7253, 4948, 9~
loyalty_card <- read.csv("data/loyalty_data.csv")
glimpse(loyalty_card)
Rows: 1,392
Columns: 4
$ timestamp  <chr> "1/6/2014", "1/6/2014", "1/6/2014", "1/6/2014", "~
$ location   <chr> "Brew've Been Served", "Brew've Been Served", "Ha~
$ price      <dbl> 4.17, 9.60, 16.53, 11.51, 12.93, 4.27, 11.20, 15.~
$ loyaltynum <chr> "L2247", "L9406", "L8328", "L6417", "L1107", "L40~

Data preparation process

Change data type

Cause data type of credit card is character which is not correct, so we need to change to date time data type.

credit_card$timestamp <- date_time_parse(credit_card$timestamp,
                zone = "",
                format = "%m/%d/%Y %H:%M")
glimpse(credit_card)
Rows: 1,490
Columns: 4
$ timestamp  <dttm> 2014-01-06 07:28:00, 2014-01-06 07:34:00, 2014-0~
$ location   <chr> "Brew've Been Served", "Hallowed Grounds", "Brew'~
$ price      <dbl> 11.34, 52.22, 8.33, 16.72, 4.24, 4.17, 28.73, 9.6~
$ last4ccnum <int> 4795, 7108, 6816, 9617, 7384, 5368, 7253, 4948, 9~

Change loyalty card for the same data type problem.

loyalty_card$timestamp <- date_time_parse(loyalty_card$timestamp,
                zone = "",
                format = "%m/%d/%Y")
glimpse(loyalty_card)
Rows: 1,392
Columns: 4
$ timestamp  <dttm> 2014-01-06, 2014-01-06, 2014-01-06, 2014-01-06, ~
$ location   <chr> "Brew've Been Served", "Brew've Been Served", "Ha~
$ price      <dbl> 4.17, 9.60, 16.53, 11.51, 12.93, 4.27, 11.20, 15.~
$ loyaltynum <chr> "L2247", "L9406", "L8328", "L6417", "L1107", "L40~

Change date data to the same form for credit card and loyalty card and join them together

credit_card$Date <- format(credit_card$timestamp, format = "%Y-%m-%d")
credit_card$Date <- date_time_parse(credit_card$Date, zone = "", format = "%Y-%m-%d")
glimpse(credit_card)
Rows: 1,490
Columns: 5
$ timestamp  <dttm> 2014-01-06 07:28:00, 2014-01-06 07:34:00, 2014-0~
$ location   <chr> "Brew've Been Served", "Hallowed Grounds", "Brew'~
$ price      <dbl> 11.34, 52.22, 8.33, 16.72, 4.24, 4.17, 28.73, 9.6~
$ last4ccnum <int> 4795, 7108, 6816, 9617, 7384, 5368, 7253, 4948, 9~
$ Date       <dttm> 2014-01-06, 2014-01-06, 2014-01-06, 2014-01-06, ~
card_joined <- credit_card %>%
  full_join(loyalty_card, by = c("Date" = "timestamp", "location", "price"))

Count transaction time of credit card & loyalty card and sort by descding order for each

popular_credit_card <- credit_card %>%
  group_by(location) %>%
  summarise(count = n()) %>%
  arrange(desc(count))
popular_loyalty_card <- loyalty_card %>%
  group_by(location) %>%
  summarise(count = n()) %>%
  arrange(desc(count))
popular_locations <- card_joined %>%
  filter(location %in% c("Katerina's Cafe", "Hippokampos", "Guy's Gyros", "Brew've Been Served", "Ouzeri Elian", "Hallowed Grounds")) %>%
  drop_na(timestamp) %>%
  dplyr::select(-Date)
popular_top_credit <- popular_credit_card %>%
 
 gather(location, count) %>%
 arrange(desc(count)) %>%
 top_n(6)

popular_top_credit
# A tibble: 6 x 2
  location            count
  <chr>               <int>
1 Katerina's Cafe       212
2 Hippokampos           171
3 Guy's Gyros           158
4 Brew've Been Served   156
5 Hallowed Grounds       92
6 Ouzeri Elian           87
top6_credit<-ggplot(data=popular_top_credit, aes(x=location, y=count)) +
  geom_bar(stat="identity", fill="steelblue")+
  theme_minimal()
top6_credit

popular_top_loyal <- popular_loyalty_card %>%
 
 gather(location, count) %>%
 arrange(desc(count)) %>%
 top_n(6)

popular_top_loyal
# A tibble: 6 x 2
  location            count
  <chr>               <int>
1 Katerina's Cafe       195
2 Hippokampos           155
3 Guy's Gyros           146
4 Brew've Been Served   140
5 Ouzeri Elian           84
6 Hallowed Grounds       80
top6_loyal<-ggplot(data=popular_top_loyal, aes(x=location, y=count)) +
  geom_bar(stat="identity", fill="pink")+
  theme_minimal()
top6_loyal

abnormal_credit_card <- popular_locations %>%
  drop_na(loyaltynum) %>%
  group_by(last4ccnum) %>%
  summarize(loy_n = n_distinct(loyaltynum)) %>%
  filter(loy_n > 1)

abnormal_credit_card
# A tibble: 7 x 2
  last4ccnum loy_n
       <int> <int>
1       1286     2
2       4795     2
3       4948     2
4       5368     2
5       5921     2
6       7889     2
7       8332     2

Observation

The most popular locations are as below:

  1. Katerina’s Cafe
  2. Hippokampos
  3. Guy’s Gyros
  4. Brew’ve Been Served
  5. Ouzeri Elian
  6. Hallowed Grounds

What’s more i found something abnormal. Reasonably one credit card transaction should relate to only one loyalty card record for the same amount value, which means who consumed the money and would get the same value of point collection for loyalty card. We can filter out those credit card which have more than one loyalty card record. We can see that those last4ccnum has anomalies being observed, which are 1286, 4795, 4948, 5368, 5921, 7889, and 8332. I will pay more attention to these card when mapping later.

2. Combine the vehicle data to the analysis of the credit and loyalty card Obseravation

packages = c('raster', 'sf', 
             'tmap', 'clock', 
             'tidyverse')
for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p,character.only = T)
}

Importing Raster file

bgmap <- raster("data/MC2-tourist.tif")
bgmap
class      : RasterLayer 
band       : 1  (of  3  bands)
dimensions : 1595, 2706, 4316070  (nrow, ncol, ncell)
resolution : 3.16216e-05, 3.16216e-05  (x, y)
extent     : 24.82419, 24.90976, 36.04499, 36.09543  (xmin, xmax, ymin, ymax)
crs        : +proj=longlat +datum=WGS84 +no_defs 
source     : MC2-tourist.tif 
names      : MC2.tourist 
values     : 0, 255  (min, max)

Plotting Raster Layer

tmap_mode("plot")
tm_shape(bgmap) +
    tm_raster(bgmap,
              legend.show = FALSE)

tm_shape(bgmap) +
    tm_rgb(bgmap,r=1,g=2,b=3,
           alpha = NA,
           saturation = 1,
           interpolate = TRUE,
           max.value = 255)

Importing Vector GIS Data File

Abila_st <- st_read(dsn = "data/Geospatial",
                    layer = "Abila")
Reading layer `Abila' from data source 
  `D:\LLLEMON21\DataViz_blog\_posts\Individual Assignment\data\Geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 3290 features and 9 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 24.82401 ymin: 36.04502 xmax: 24.90997 ymax: 36.09492
Geodetic CRS:  WGS 84

Calculate new column for time interval to get the last timing point when driver used vehicle again, and named it as “gps2.csv” file

gps2 <- read_csv("data/gps2.csv")
glimpse(gps2)
Rows: 685,169
Columns: 6
$ Timestamp         <chr> "1/6/2014 7:20", "1/6/2014 7:20", "1/6/201~
$ id                <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
$ lat               <dbl> 36.06646, 36.06634, 36.06615, 36.06613, 36~
$ long              <dbl> 24.88258, 24.88259, 24.88258, 24.88258, 24~
$ `Time Difference` <time>       NA, 00:02:00, 00:03:00, 00:01:00, 0~
$ Seconds           <dbl> 0, 2, 3, 1, 3, 1, 1, 1, 4, 1, 1, 2, 3, 1, ~

Converting Date-Time field

gps2$Timestamp <- date_time_parse(gps2$Timestamp,
                zone = "",
                format = "%m/%d/%Y %H:%M")

change data type of “id” to the “id” form to correctly show

gps2$id <- as_factor(gps2$id)

Converting Aspatial Data into a Simple Feature Data Frame

gps_sf <- st_as_sf(gps2, 
                   coords = c("long", "lat"), 
                   crs= 4326)

Add more columns to get features that help to analyze

gps_sf$day <- format(gps_sf$Timestamp, format="%d")
gps_sf$hour <- format(gps_sf$Timestamp, format="%H")
gps_sf$minute <- format(gps_sf$Timestamp, format="%M")

Manually filter those gps record which had more than 5 minutes time interval

more_than_5mins <- gps_sf %>%
  filter(Seconds > 300)

Plotting the gps Paths

gps_path <- gps_sf %>%
  group_by(id, hour, day, minute) %>%
  summarize(m = mean(Timestamp), 
            do_union=FALSE) %>%
  st_cast("LINESTRING")

Figure out gps path which only got one record and only select gps records which are more than 1, cause we should get path which having one more gps record.

p = npts(gps_path, by_feature = TRUE)
gps_path2 <- cbind(gps_path, p)
gps_path2 <- gps_path2 %>%
  filter(p>1)

Creating animated map with tmap_animation() function

Draw the plot graph using dot plot

gps_dot <- more_than_5mins %>%
  group_by(id, hour, day, minute) %>%
  summarize(geo_n = n_distinct(geometry)) %>%
  st_cast("POINT")
card_selected <- card_joined %>%
  filter(last4ccnum == 9735)

I filter day, hour and minute for specific time spot and try to match with credit card purchasing record. get the matched credit card number, loyalty number and car ID

Take the graph below as an example, after getting the gps dot plot then I get the gps record on the graph. then check with credit card ID purchasing record.The purchasing record a bit earlier than the gps record which can guess this person drive away after paying by credit card.

3. Infer the owners of each credit card and loyalty card

car <- read_csv("data/car-assignments.csv")
glimpse(car)
Rows: 44
Columns: 5
$ LastName               <chr> "Calixto", "Azada", "Balas", "Barranc~
$ FirstName              <chr> "Nils", "Lars", "Felix", "Ingrid", "I~
$ CarID                  <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12~
$ CurrentEmploymentType  <chr> "Information Technology", "Engineerin~
$ CurrentEmploymentTitle <chr> "IT Helpdesk", "Engineer", "Engineer"~
car <- car %>%
  drop_na(CarID)

car$CarID <- as_factor(car$CarID)

glimpse(car)
Rows: 35
Columns: 5
$ LastName               <chr> "Calixto", "Azada", "Balas", "Barranc~
$ FirstName              <chr> "Nils", "Lars", "Felix", "Ingrid", "I~
$ CarID                  <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12~
$ CurrentEmploymentType  <chr> "Information Technology", "Engineerin~
$ CurrentEmploymentTitle <chr> "IT Helpdesk", "Engineer", "Engineer"~
glimpse(gps2)
Rows: 685,169
Columns: 6
$ Timestamp         <dttm> 2014-01-06 07:20:00, 2014-01-06 07:20:00,~
$ id                <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
$ lat               <dbl> 36.06646, 36.06634, 36.06615, 36.06613, 36~
$ long              <dbl> 24.88258, 24.88259, 24.88258, 24.88258, 24~
$ `Time Difference` <time>       NA, 00:02:00, 00:03:00, 00:01:00, 0~
$ Seconds           <dbl> 0, 2, 3, 1, 3, 1, 1, 1, 4, 1, 1, 2, 3, 1, ~
car_gps <- car %>%
  full_join(gps2, by = c("CarID" = "id"))
car_gps <- st_as_sf(car_gps, 
                   coords = c("long", "lat"),
                       crs= 4326)

car_gps <- car_gps %>%
  unite("Name", FirstName, LastName, sep = " ")
abnormal_cc_match <- read.csv("data/abnormal_cc.csv")

abnormal_cc_match
  ï..CC_number Loyalty_number Car_ID               Name
1         1286          L3572     22      Adra Nubarron
2         1286          L3288     22      Adra Nubarron
3         4795          L8566     34        Edvard Vann
4         4948          L3295     18    Birgitta Frente
5         5921          L9406     29      Bertrand Ovan
6         5921          L3295     29      Bertrand Ovan
7         7889          L6119      8      Lucas Alcazar
8         7889          L2247 8/22/6                  -
9         8332          L2070     10 Ada Campo-Corrente
   CurrentEmploymentType   CurrentEmploymentTitle
1               Security           Badging Office
2               Security           Badging Office
3               Security        Perimeter Control
4            Engineering                Geologist
5             Facilities Facilities Group Manager
6             Facilities Facilities Group Manager
7 Information Technology            IT Technician
8                      -                        -
9              Executive                  SVP/CIO
total_match <- read.csv("data/total_match.csv")

total_match
   ï..CC_number Loyalty_number Car_ID              Name
1          9551          L5777      1      Nils Calixto
2          1415          L7783      2        Lars Azada
3          9635          L3191      3       Felix Balas
4          7688          L4164      4   Ingrid Barranco
5          6899          L6267      5         Isak Baza
6          7253          L1682      6     Linnea Bergen
7          2540          L5947      7       Elsa Orilla
8          1877          L3014      9      Gustav Cazar
9          1311          L4149     11       Axel Calzas
10         7108          L6544     12   Hideki Cocinaro
11         5407          L4034     13        Inga Ferro
12         7819          L5259     13        Inga Ferro
13         1874          L4424     14     Lidelse Dedos
14         3853          L1485     15    Loreto Bodrogi
15         7354          L9254     16         Isia Vann
16         7384          L3800     17       Sven Flecha
17         9617          L5553     18   Birgitta Frente
18         2418          L9018     19       Vira Frente
19         6895          L3366     19       Vira Frente
20         6816          L8148     20      Stenig Fusil
21         9405          L3259     21    Hennie Osvaldo
22         3484          L2490     23       Varja Lagos
23         4434          L2169     24        Minke Mies
24         8202          L2343     24        Minke Mies
25         2142          L9637     25     Kanon Herrero
26         1310          L8012     26        Marin Onda
27         2681          L1107     27       Kare Orilla
28         3492          L7814     27       Kare Orilla
29         9241          L3288     28   Isande Borrasca
30         3547          L9362     29     Bertrand Ovan
31         6691          L6267     29     Bertrand Ovan
32         6901          L9363     30     Felix Resumir
33         5010          L2459     31 Sten Sanjorge Jr.
34         8156          L5224     32       Orhan Strum
35         9683          L7291     33   Brand Tempestad
36         2463          L6886     35 Willem Vasco-Pais
37         3506          L7761    101              #N/A
38         9220          L4063    101              #N/A
39         9614          L5924    101              #N/A
40         8642          L2769    104              #N/A
41         7792          L5756    105              #N/A
42         2276          L3317    106              #N/A
43         4530          L8477    107              #N/A
44         9735          L9633    107              #N/A
    CurrentEmploymentType       CurrentEmploymentTitle
1  Information Technology                  IT Helpdesk
2             Engineering                     Engineer
3             Engineering                     Engineer
4               Executive                      SVP/CFO
5  Information Technology                IT Technician
6  Information Technology             IT Group Manager
7             Engineering             Drill Technician
8             Engineering             Drill Technician
9             Engineering         Hydraulic Technician
10               Security                 Site Control
11               Security                 Site Control
12               Security                 Site Control
13            Engineering    Engineering Group Manager
14               Security                 Site Control
15               Security            Perimeter Control
16 Information Technology                IT Technician
17            Engineering                    Geologist
18            Engineering         Hydraulic Technician
19            Engineering         Hydraulic Technician
20               Security             Building Control
21               Security            Perimeter Control
22               Security               Badging Office
23               Security            Perimeter Control
24               Security            Perimeter Control
25            Engineering                    Geologist
26            Engineering           Drill Site Manager
27            Engineering             Drill Technician
28            Engineering             Drill Technician
29            Engineering             Drill Technician
30             Facilities     Facilities Group Manager
31             Facilities     Facilities Group Manager
32               Security       Security Group Manager
33              Executive                President/CEO
34              Executive                      SVP/COO
35            Engineering             Drill Technician
36              Executive Environmental Safety Advisor
37                   #N/A                         #N/A
38                   #N/A                         #N/A
39                   #N/A                         #N/A
40                   #N/A                         #N/A
41                   #N/A                         #N/A
42                   #N/A                         #N/A
43                   #N/A                         #N/A
44                   #N/A                         #N/A

I combined with car assignment file and inferred the owners of each credit card and loyalty card, as shown above.

4. Identify potential informal or unofficial relationships among GASTech personnel

i can see from the path above, these three people have the common path very often so gathering a lot, what’s more they often get coffee time in the morning and they are in the same security department, so i guess they have relative close relationship and get along quite well.

these two people are really skeptical cause they went to hotel many times during working time.

5. Identify suspicious activity and locations where the suspicious activity is occurring

i filter out the gps path which got record at 2am, 3am, and 4am. then i found these car ID, 21, 24, 15 got record in this time period which is very skeptical, and the abnormal place are Abila Scrapyard and GAS Tech, i am not sure if they need work that late in the company and cannot figure out the reason why they still outside in the midnight.